Asymptotically optimal priority policies for indexable and non-indexable restless bandits
Abstract
We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property and a technical condition. We consider both a fixed population of bandits as well as a dynamic population where bandits can depart and arrive. As an example of a dynamic population of bandits, we analyze a multi-class M/M/S+M queue for which we show asymptotic optimality of an index policy. We combine fluid-scaling techniques with linear programming results to prove that when bandits are indexable, Whittle’s index policy is included in our class of priority policies. We thereby generalize a result of Weber and Weiss (1990) about asymptotic optimality of Whittle’s index policy to settings with (i) several classes of bandits, (ii) arrivals of new bandits, and (iii) multiple actions. Indexability of the bandits is not required for our results to hold. For non-indexable bandits we describe how to select priority policies from the class of asymptotically optimal policies and present numerical evidence that, outside the asymptotic regime, the performance of our proposed priority policies is nearly optimal.
Similar resources
Restless Bandits, Partial Conservation Laws and Indexability
We show that if performance measures in a general stochastic scheduling problem satisfy partial conservation laws (PCL), which extend the generalized conservation laws (GCL) introduced by Bertsimas and Niño-Mora (1996), then the problem is solved optimally by a priority-index policy under a range of admissible linear performance objectives, with both this range and the optimal indices being det...
Asymptotic optimal control of multi-class restless bandits
We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines at each decision epoch which bandits to make active in order to minimize the overall average cost associated to the states the bandits are in. Sinc...
When are Kalman-Filter Restless Bandits Indexable?
We study the restless bandit associated with an extremely simple scalar Kalman filter model in discrete time. Under certain assumptions, we prove that the problem is indexable in the sense that the Whittle index is a non-decreasing function of the relevant belief state. In spite of the long history of this problem, this appears to be the first such proof. We use results about Schur-convexity an...
Restless Bandit Marginal Productivity Indices, Diminishing Returns, and Optimal Control of Make-to-Order/Make-to-Stock M/G/1 Queues
This paper presents a framework grounded on convex optimization and economics ideas to solve by index policies problems of optimal dynamic allocation of effort to a discrete-state (finite or countable) binary-action (work/rest) semi-Markov restless bandit project, elucidating issues raised by previous work. Its contributions include: (i) the concept of a restless bandit’s marginal productivity ...
Conservation laws,
We show that if performance measures in stochastic and dynamic scheduling problems satisfy generalized conservation laws, then the feasible space of achievable performance is a polyhedron called an extended polymatroid that generalizes the usual polymatroids introduced by Edmonds. Optimization of a linear objective over an extended polymatroid is solved by an adaptive greedy algorithm, which le...